Model compression via distillation and quantization
Authors
Abstract
Deep neural networks (DNNs) continue to make significant advances, solving tasks ranging from image classification to translation and reinforcement learning. One aspect of the field receiving considerable attention is the efficient execution of deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger networks, called “teachers,” into compressed “student” networks. The first method, quantized distillation, leverages distillation during the training process by incorporating a distillation loss, expressed with respect to the teacher network, into the training of a smaller student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of the quantization points through stochastic gradient descent, to better fit the behavior of the teacher model. We validate both methods through experiments on convolutional and recurrent architectures. We show that quantized shallow students can reach accuracy levels similar to state-of-the-art full-precision teacher models, while providing up to an order of magnitude of compression and an inference speedup that is almost linear in the depth reduction. In sum, our results enable DNNs for resource-constrained environments to leverage architecture and accuracy advances developed on more powerful devices.
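To make the quantized-distillation idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the student is trained against a weighted combination of the usual cross-entropy on the labels and a distillation term computed from the teacher's softened outputs, while its weights are mapped onto a small set of uniform quantization levels. The function names (distillation_loss, uniform_quantize), the temperature T, the mixing weight alpha, and the bit-width are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft term: KL divergence between the softened teacher and student
    # distributions, scaled by T^2 as is standard for distillation losses.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def uniform_quantize(w, bits=4):
    # Map a weight tensor onto 2**bits uniformly spaced levels over its range.
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = ((w_max - w_min) / levels).clamp(min=1e-8)
    return torch.round((w - w_min) / scale) * scale + w_min

In a training step one would quantize the student's weights with uniform_quantize for the forward pass, back-propagate distillation_loss, and apply the resulting gradients to full-precision copies of the weights; differentiable quantization, by contrast, would treat the locations of the quantization points themselves as parameters updated by stochastic gradient descent.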
Related resources
Concentration of Colourful Wild Berry Fruit Juices by Membrane Osmotic Distillation via Cascade Model Systems
Fresh juices of colourful wild berries (cornelian cherry, blackthorn, white beam and elderberry) are considered valuable, highly nutritive beverages, characterized by high levels of vitamins and antioxidant capacity. The concentration of these juices by membrane osmotic distillation was studied, a process in which only water vapour is removed while the heat-sensitive, valuable compounds ...
Fast and Accurate Single Image Super-Resolution via Information Distillation Network
Recently, deep convolutional neural networks (CNNs) have demonstrated remarkable progress on single image super-resolution. However, as the depth and width of the networks increase, CNN-based super-resolution methods face challenges of computational complexity and memory consumption in practice. To address these issues, we propose a deep but compact convol...
Model Distillation with Knowledge Transfer in Face Classification, Alignment and Verification
Knowledge distillation is a potential solution for model compression. The idea is to make a small student network imitate the targets of a large teacher network, so that the student becomes competitive with the teacher. Most previous studies focus on model distillation for the classification task, where they propose different architectures and initializations for the student network. However, ...
Adaptive Quantization for Deep Neural Network
In recent years, Deep Neural Networks (DNNs) have developed rapidly across various applications, with increasingly complex architectures. The performance gains of these DNNs generally come with high computational costs and large memory consumption, which may not be affordable for mobile platforms. Deep model quantization can be used to reduce the computation and memory costs of DNNs...
Optimal Neural Net Compression via Constrained Optimization
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. Firstly, we give a general formulation of model compression as constrained optimization. This makes the problem of model compression well defined and amenable to the use of modern numerical optimization...
Journal: CoRR
Volume: abs/1802.05668
Issue: -
Pages: -
Publication date: 2018